Abstract

In this work, we introduce multiplicative drift analysis as a suitable way
to analyze the runtime of randomized search heuristics such as evolutionary
algorithms.

We give a multiplicative version of the classical drift theorem. This
allows easier analyses in those settings where the optimization progress
is roughly proportional to the current distance to the optimum.

To display the strength of this tool, we regard the classical problem of
how the (1+1) Evolutionary Algorithm optimizes an arbitrary linear
pseudo-Boolean function. Here, we first give a relatively simple proof
for the fact that any linear function is optimized in expected time
$O(n \log n)$, where $n$ is the length of the bit string. Afterwards, we
show that in fact any such function is optimized in expected time at
most $(1 + o(1))\,1.39\,e\,n \ln(n)$, again using multiplicative drift analysis.
We also prove a corresponding lower bound of $(1 - o(1))\,e\,n \ln(n)$ which
actually holds for all functions with a unique global optimum.

We further demonstrate how our drift theorem immediately gives natural
proofs (with better constants) for the best known runtime bounds for the
(1+1) Evolutionary Algorithm on combinatorial problems like finding
minimum spanning trees, shortest paths, or Euler tours.



* Carola Winzen is a recipient of the Google Europe Fellowship in Randomized Algo-
1 Introduction



An innocent-looking problem is the question of how long the (1+1)
Evolutionary Algorithm ((1+1) EA) needs to find the optimum of a given linear
function. However, this is in fact one of the problems that were most
influential for the theory of evolutionary algorithms.

While particular linear functions like the functions OneMax and BinVal
were easily analyzed, it took a major effort by Droste, Jansen and
Wegener [DJW02] to solve the problem in full generality. Their proof,
however, is highly technical.

A major breakthrough spurred by this problem is the work by He and
Yao [HY01, HY04], who introduced drift analysis to the field of
evolutionary computation. This allowed a significantly simpler proof for the
linear functions problem. Even more important, it quickly became one
of the most powerful tools for proving both upper and lower bounds on
the expected optimization times of evolutionary algorithms. For example,
see [HY04, GW03, GL06, HJKN08, NOW09, OW].

Further progress was made by Jägersküpper [Jag08], who combined drift
analysis with a clever averaging argument to determine reasonable values
for the usually not explicitly given constants. More precisely,
Jägersküpper showed that the expected optimization time of the (1+1) EA
for any linear function defined on bit strings of length $n$ is bounded from
above by $(1 + o(1))\,2.02\,e\,n \ln(n)$.

1.1 Classical Drift Analysis 

The following method was introduced to the analysis of randomized search 
heuristics by He and Yao [HY04] and builds on a result of Hajek [Haj82]. 
When analyzing the optimization behavior of a randomized search heuristic 
over a search space, instead of tracking how the objective function improves, 
one uses an auxiliary potential function and tracks its behavior. 

For example, consider the search space $\{0,1\}^n$ of bit strings of length
$n \in \mathbb{N}$. Suppose we want to analyze the (1+1) EA (which is introduced as
Algorithm 1 in Section 3) minimizing a linear function
$f\colon \{0,1\}^n \to \mathbb{R}$ with

\[ f(x) = \sum_{i=1}^{n} w_i x_i \]

and arbitrary positive weights $0 < w_1 \le \dots \le w_n$. (Note that we differ from
previous works by always considering minimization problems. See Section 3
for a discussion why this does not influence the runtime analysis.) Then this
potential function $h\colon \{0,1\}^n \to \mathbb{R}$ can be chosen as

\[ h(x) = \ln\Bigl(1 + \sum_{i=1}^{\lfloor n/2 \rfloor} x_i + \sum_{i=\lfloor n/2 \rfloor + 1}^{n} 2 x_i\Bigr). \tag{1} \]

Footnote: By $\mathbb{N} := \{0, 1, 2, \dots\}$ we denote the set of non-negative integers
and by $\mathbb{R}$ we denote the set of real numbers.



Though still needing some calculations, one can show the following (see,
e.g., [HY04] where a variant of Algorithm 1 is analyzed). Let $x \in \{0,1\}^n$.
Let $y \in \{0,1\}^n$ be the result of one iteration (mutation and selection) of the
(1+1) EA started in $x$. Then there exists a $\delta > 0$ such that

\[ E[h(x) - h(y)] \ge \frac{\delta}{n}. \tag{2} \]

Now, classical drift analysis tells us that in expectation after a number of
$h(x)/(\delta/n) = O(n \log n)$ iterations, the potential value is reduced to zero.
But $h(x) = 0$ implies $f(x) = 0$, that is, the (1+1) EA has found the desired
optimum.

Using drift analysis to analyze a randomized search heuristic usually
bears two difficulties. The first is guessing a suitable potential function $h$.
The second, related to the first, is proving that during the search, $f$ and $h$
behave sufficiently similarly, that is, that we can prove some statement like
inequality (2). Note that this inequality contains information about $f$ as well,
namely implicitly in the fact that $y$ has an $f$-value at least as good as that of $x$.

A main difficulty in showing that h in (1) is a suitable potential function 
is the logarithm around the simple linear function giving weights one and two 
to the bits. However, since the optimization progress for linear functions is 
faster if we are further away from the optimum, that is, have more one-bits, 
this seems difficult to avoid. 

1.2 Multiplicative Drift Analysis 

We present a way to ease the use of drift analysis in such settings. Informally,
our method applies if we have a potential function $g$ satisfying

\[ E[g(x) - g(y)] \ge \delta\, g(x) \tag{3} \]

in the notation above. That is, we require a progress which multiplicatively
depends on the current potential value. For this reason we call the method
multiplicative drift analysis. We will see that for a number of problems such
potential functions are a natural choice.

This new method allows us to largely separate the structural analysis 
of an optimization process from the actual calculation of a bound on the 
expected optimization time. Moreover, the runtime bounds obtained by 
multiplicative drift analysis are often sharper than those resulting from pre- 
viously used techniques. 




1.3 Our Results



We apply this new tool, multiplicative drift analysis, to the already
mentioned problem of optimizing linear functions over $\{0,1\}^n$. This yields a
simplified proof of the $O(n \log n)$ bound on the optimization time of the
(1+1) EA. Similar to the proof using the classical drift theorem, we make
use of the simple linear function $g\colon \{0,1\}^n \to \mathbb{R}$, chosen as

\[ g(x) = \sum_{i=1}^{n} \Bigl(1 + \frac{i}{n}\Bigr) x_i. \]

This function $g$ serves us as a potential function for all linear functions
$f\colon \{0,1\}^n \to \mathbb{R}$ with

\[ f(x) = \sum_{i=1}^{n} w_i x_i \]

and monotone weights $0 < w_1 \le \dots \le w_n$.

Using parts of Jägersküpper's analysis [Jag08], we then improve his
upper bound on the expected optimization time of the (1+1) EA on linear
functions to $(1 + o(1))\,1.39\,e\,n \ln(n)$.

We also give lower bounds for this problem. We show that, in the class of
all functions with a unique global optimum, the function OneMax (see (8))
has the smallest expected optimization time. This extends the lower bound
of $(1 - o(1))\,e\,n \ln(n)$ for the expected optimization time of the (1+1) EA
on OneMax [DFW10] to all functions in that class (including all linear
functions with non-zero coefficients).

Together with our upper bound, we thus obtain the remarkable result 
that all linear functions have roughly (within a 39% range) the same opti- 
mization time. 

To further demonstrate the strength of multiplicative drift analysis, we 
give straight-forward analyses for three prominent combinatorial problems. 
We consider the problems of computing minimum spanning trees (MST), 
single-source shortest paths (SSSP), and Euler tours. Here, we reproduce the 
results obtained in [NW07] (cf. Theorem 15), in [BBD+09] (cf. Theorem 17), 
and in [DJ07] (cf. Theorem 19), respectively. In doing so, we improve the 
leading constants of the asymptotic bounds. 

2 Multiplicative Drift Analysis 

Drift analysis can be used to track the optimization behavior of a random- 
ized search heuristic over a search space by measuring the progress of the 
algorithm with respect to a potential function. Such a function maps each 
search point to a non-negative real number, where a potential of zero indi- 
cates that the search point is optimal. 
Theorem 1 (Additive Drift [HY04]). Let $S \subseteq \mathbb{R}$ be a finite set of positive
numbers and let $\{X^{(t)}\}_{t \in \mathbb{N}}$ be a sequence of random variables over $S \cup \{0\}$.
Let $T$ be the random variable that denotes the first point in time $t \in \mathbb{N}$ for
which $X^{(t)} = 0$.

Suppose that there exists a constant $\delta > 0$ such that

\[ E\bigl[X^{(t)} - X^{(t+1)} \mid T > t\bigr] \ge \delta \tag{4} \]

holds. Then

\[ E\bigl[T \mid X^{(0)}\bigr] \le \frac{X^{(0)}}{\delta}. \]

This theorem tells us how to link the expected time at which the potential
reaches zero to the first time the expected value of the potential reaches zero.
If in expectation the potential decreases in each step by $\delta$, then after
$X^{(0)}/\delta$ steps the expected potential is zero. Thus, one might expect the
expected number of steps until the (random) potential reaches zero to be
$X^{(0)}/\delta$, too. This is indeed the case in the setting of the previous theorem.

In order to apply the previous theorem to the analysis of randomized
search heuristics over a (finite) search space $S$, we define a potential
function $h\colon S \to \mathbb{R}$ which maps all optimal search points to zero and all
non-optimal search points to values strictly larger than zero. We choose the
random variable $X^{(t)}$ as the potential $h(x^{(t)})$ of the search point (or
population) in the $t$-th iteration of the algorithm. Then the random variable $T$
becomes the optimization time of the algorithm, that is, the number of
iterations until the algorithm finds an optimum.

When applying Theorem 1, we call the expected difference between
$h(x^{(t)})$ and $h(x^{(t+1)})$ the drift of the random process $\{x^{(t)}\}_{t \in \mathbb{N}}$ with
respect to $h$. We say this drift is additive if condition (4) holds.

2.1 Ideal Potential Functions for Additive Drift Analysis 

The application of additive drift analysis (Theorem 1) to the runtime
analysis of randomized search heuristics requires a suitable potential function.
The following lemma (Lemma 3 in [HY04]) tells us that if the random search
points $x^{(0)}, x^{(1)}, \dots$ generated by a search heuristic form a homogeneous
absorbing Markov chain, then there always exists a potential function such
that condition (4) in Theorem 1 holds with equality; namely the function
that attributes to each search point the expected optimization time of the
algorithm starting in that point.

Lemma 2 ([HY04]). Let $S$ be a finite search space and $\{x^{(t)}\}_{t \in \mathbb{N}}$ the search
points generated by a homogeneous absorbing Markov chain on $S$. Let $T$ be
the random variable that denotes the first point in time $t \in \mathbb{N}$ such that $x^{(t)}$
is optimal.

Then the drift on the potential function $g\colon S \to \mathbb{R}$ with

\[ g(x) := E[T \mid x^{(0)} = x] \]

satisfies

\[ E\bigl[g(x^{(t)}) - g(x^{(t+1)}) \mid T > t\bigr] = 1. \]

In a way, $E[T \mid x^{(0)} = x]$ is an "ideal" potential function for Theorem 1.
It satisfies the additive drift condition (4) with equality and results in a
precise upper bound on $E[T \mid x^{(0)} = x]$. However, the previous lemma is not
directly helpful in the runtime analysis of randomized search heuristics. In
order to apply it, we need to know the exact expected optimization time of
an algorithm starting from every point in the search space.
But with all this known, Theorem 1 does not provide new information.
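
As a small illustration of Lemma 2, the following Python sketch considers a hypothetical absorbing chain on the states 0, 1, ..., m that moves from state s to s - 1 with probability p_s and stays put otherwise. The chain and the names `ideal_potential`, `drift`, and `move_prob` are assumptions made for this example and do not appear in the cited works. The sketch computes g(s) = E[T | start in s] exactly and evaluates the drift with respect to g in every non-absorbing state.

```python
from fractions import Fraction


def ideal_potential(move_prob):
    """Return g with g[s] = E[T | start in s] for a chain on 0..m that moves
    from s to s - 1 with probability move_prob[s] and stays in s otherwise.

    The waiting time in state s is geometric with mean 1 / move_prob[s],
    hence g[s] = g[s - 1] + 1 / move_prob[s] and g[0] = 0.
    """
    g = [Fraction(0)]
    for s in range(1, len(move_prob)):
        g.append(g[s - 1] + 1 / move_prob[s])
    return g


def drift(g, move_prob, s):
    """Expected one-step decrease of g when the chain is in state s >= 1."""
    return move_prob[s] * (g[s] - g[s - 1])


# Hypothetical example: m = 6, and state s is left with probability s / 20.
m = 6
move_prob = [None] + [Fraction(s, 20) for s in range(1, m + 1)]
g = ideal_potential(move_prob)
drifts = [drift(g, move_prob, s) for s in range(1, m + 1)]
print(drifts)
```

In this toy chain every entry of `drifts` equals one, which is the equality case of condition (4). Computing `g` already required the exact expected hitting times, which mirrors the remark that the ideal potential gives no new information.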

Still, the previous lemma indicates that potential functions which
approximate the expected optimization time in the respective point are good
candidates likely to satisfy the additive drift condition. In the next section,
we will see such a potential function suitable for the analysis of the
optimization behavior of the (1+1) Evolutionary Algorithm on linear functions.

2.2 A Multiplicative Drift Theorem 

The drift theorem presented in this subsection can be considered as the 
multiplicative version of the classical additive result. Since we derive it 
from the original result, it is clear that the multiplicative version cannot be 
stronger than the original theorem. 

Theorem 3 (Multiplicative Drift). Let $S \subseteq \mathbb{R}$ be a finite set of positive
numbers with minimum $s_{\min}$. Let $\{X^{(t)}\}_{t \in \mathbb{N}}$ be a sequence of random
variables over $S \cup \{0\}$. Let $T$ be the random variable that denotes the first point
in time $t \in \mathbb{N}$ for which $X^{(t)} = 0$.

Suppose that there exists a constant $\delta > 0$ such that

\[ E\bigl[X^{(t)} - X^{(t+1)} \mid X^{(t)} = s\bigr] \ge \delta s \tag{5} \]

holds for all $s \in S$ with $\Pr[X^{(t)} = s] > 0$. Then for all $s_0 \in S$ with
$\Pr[X^{(0)} = s_0] > 0$,

\[ E\bigl[T \mid X^{(0)} = s_0\bigr] \le \frac{1 + \ln(s_0/s_{\min})}{\delta}. \]

Like for the notion of additive drift, we say that the drift of a random
process $\{x^{(t)}\}_{t \in \mathbb{N}}$ with respect to a potential function $g$ is multiplicative if
condition (5) holds for the associated random variables $X^{(t)} := g(x^{(t)})$.
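
To get a feeling for the bound in Theorem 3, the following Python sketch compares it with the exact expected hitting time of a toy process. The process is an assumption made for this example: on the states 0, 1, ..., s0 it moves from s to s - 1 with probability delta * s and otherwise stays in s, so its drift equals delta * s and condition (5) holds with equality. The function names are hypothetical as well.

```python
import math
from fractions import Fraction


def multiplicative_drift_bound(s0, s_min, delta):
    """Upper bound (1 + ln(s0 / s_min)) / delta as stated in Theorem 3."""
    return (1.0 + math.log(s0 / s_min)) / delta


def exact_hitting_time(s0, delta):
    """Exact E[T] for the toy chain: the waiting time in state s is geometric
    with mean 1 / (delta * s), and the states s0, s0 - 1, ..., 1 are visited
    one after another."""
    return sum(Fraction(1) / (delta * s) for s in range(1, s0 + 1))


s0 = 50
delta = Fraction(1, 100)  # delta * s <= 1/2 for all s <= s0, a valid probability
exact = float(exact_hitting_time(s0, delta))
bound = multiplicative_drift_bound(s0, 1, float(delta))
print(exact, bound)
```

For this chain the exact value is the harmonic number H_{s0} divided by delta, so the gap to the bound stays below 1/delta for every starting value s0.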

The advantage of the multiplicative approach is that it allows the use of
potential functions which are more natural. The most natural potential
function, obviously, is the distance of the objective value of the current
solution to the optimum. This often is a good choice in the analysis of
combinatorial optimization problems. For example, in Section 4 we see
that the runtimes of the (1+1) EA on finding a minimum spanning tree,
a shortest path tree, or an Euler tour can be bounded by analyzing this
potential function.

Another potential function for which drift analysis has been successfully 
applied is the distance in the search space between the current search points 
and a (global) optimum. 

The typical example for this is the drift analysis for linear functions in
Section 3, where we use the (weighted) Hamming distance to the optimum
as potential for all functions of this class. While being more difficult to
analyze, this approach often gives tighter bounds which are independent of
the range of potential fitness values.

Note that multiplicative drift analysis applies to all situations where
previously the so-called method of expected weight decrease was used. This
method also builds on the observation that if the drift is multiplicative (that
is, condition (5) holds), then at time $t = (1 + \ln(s_0/s_{\min}))/\delta$ the expected
potential $E[X^{(t)}]$ is at most $s_0/e$. Afterwards, various methods (variants of Wald's
identity in [DJW02, Jag08] and Markov's inequality in [NW07, BBD+09]) are
used to show that the expected stopping time $E[T]$ is indeed in this regime.
However, the bounds obtained in this way are not best possible. This is
demonstrated in Section 3.3 where we replace for the proofs in [Jag08] the
method of expected weight decrease by the above multiplicative drift
theorem. This results in an immediate improvement of the leading constant in
the main runtime bound of [Jag08].

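Proof of Theorem 3. Let $g\colon S \to \mathbb{R}$ be the function defined by

\[ g(s) := 1 + \ln\frac{s}{s_{\min}}. \]

Let $R := g(S)$ be the image of $g$ and let $\{Z^{(t)}\}_{t \in \mathbb{N}}$ be the sequence of
random variables over $R \cup \{0\}$ given by

\[ Z^{(t)} := \begin{cases} 0 & \text{if } X^{(t)} = 0, \\ g(X^{(t)}) & \text{otherwise.} \end{cases} \]

Then $T$ is also the first point in time $t \in \mathbb{N}$ such that $Z^{(t)} = 0$.
Suppose $T > t$. Then we have $Z^{(t)} = g(X^{(t)}) > 0$. If $X^{(t+1)} = 0$, then
also $Z^{(t+1)} = 0$ and

\[ Z^{(t)} - Z^{(t+1)} = 1 + \ln\frac{X^{(t)}}{s_{\min}} \ge 1 = \frac{X^{(t)} - X^{(t+1)}}{X^{(t)}}. \tag{6} \]

Otherwise, $Z^{(t+1)} = g(X^{(t+1)})$ and again

\[ Z^{(t)} - Z^{(t+1)} = \ln\frac{X^{(t)}}{X^{(t+1)}} \ge \frac{X^{(t)} - X^{(t+1)}}{X^{(t)}}, \tag{7} \]

where the last inequality follows from

\[ \frac{u}{w} = 1 + \frac{u - w}{w} \le e^{\frac{u - w}{w}}, \]

which implies

\[ \ln\frac{u}{w} \le \frac{u - w}{w} \]

and thus

\[ \ln\frac{w}{u} \ge \frac{w - u}{w} \]

for all positive $u, w \in \mathbb{R}$.

Hence, by (6) and (7), independent of whether $Z^{(t+1)} = 0$ or $Z^{(t+1)} \ne 0$,
we have

\[ Z^{(t)} - Z^{(t+1)} \ge \frac{X^{(t)} - X^{(t+1)}}{X^{(t)}}. \]

Let $r \in R$. Since $g$ is bijective, there exists a unique $s \in S$ such that
$r = g(s)$. Moreover, the events $Z^{(t)} = r$ and $X^{(t)} = s$ coincide. Hence, we
have by condition (5) that

\[ E\bigl[Z^{(t)} - Z^{(t+1)} \mid Z^{(t)} = r\bigr] \ge \frac{E\bigl[X^{(t)} - X^{(t+1)} \mid X^{(t)} = s\bigr]}{s} \ge \delta. \]

Finally, we apply Theorem 1 for additive drift and obtain for $s \in S$
with $\Pr[X^{(0)} = s] > 0$ that

\[ E\bigl[T \mid X^{(0)} = s\bigr] = E\bigl[T \mid Z^{(0)} = g(s)\bigr] \le \frac{g(s)}{\delta} = \frac{1 + \ln(s/s_{\min})}{\delta}, \]

which concludes the proof of the theorem. □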

In Section 3 and Section 4, we demonstrate the strength of this new 
tool by applying it to four well-known problems: the problem of minimizing 
linear pseudo-Boolean functions, the minimum spanning tree problem, the 
single-source shortest path problem, and the problem of finding Euler tours. 



3 The Runtime of the (1+1) Evolutionary Algorithm on Pseudo-Boolean Functions

Many optimization problems can be phrased as the problem of maximizing or
minimizing a pseudo-Boolean function $f\colon \{0,1\}^n \to \mathbb{R}$ where $n$ is a positive
integer. In the setting of randomized search heuristics, such a function $f$ is
considered to be a black box, that is, the optimization process can access $f$
only by evaluating it at a limited number of points in $\{0,1\}^n$.

In this section, we analyze the (1+1) Evolutionary Algorithm ((1+1) EA)
for pseudo-Boolean functions (Algorithm 1). This algorithm follows the
neighborhood structure imposed by the hypercube on $\{0,1\}^n$ where two
points are adjacent if they differ by exactly one bit, that is, if their Hamming
distance is one. The (1+1) EA successively attempts to improve the so-far
best search point by randomly sampling candidates over $\{0,1\}^n$ according
to probabilities decreasing with the distance to the current best search point.

Algorithm 1: The (1+1) Evolutionary Algorithm ((1+1) EA) with
mutation rate $1/n$ for minimizing $f\colon \{0,1\}^n \to \mathbb{R}$.

1 choose $x^{(0)} \in \{0,1\}^n$ uniformly at random;
2 for $t = 0$ to $\infty$ do
3     sample $y^{(t)} \in \{0,1\}^n$ by flipping each bit in $x^{(t)}$ with
      probability $1/n$;
4     if $f(y^{(t)}) \le f(x^{(t)})$ then
5         $x^{(t+1)} := y^{(t)}$
6     else
7         $x^{(t+1)} := x^{(t)}$
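
The loop of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the stopping rule via a known optimal value `target`, the iteration cap `max_iters`, and all function names are assumptions added so that the sketch terminates.

```python
import random


def one_plus_one_ea(f, n, target, rng, max_iters=1_000_000):
    """Minimize f over bit strings of length n with mutation rate 1/n.

    Returns (t, x), where t is the first iteration in which f(x) == target.
    `target` is the (assumed known) minimum value of f.
    """
    x = [rng.randint(0, 1) for _ in range(n)]       # Step 1: uniform start
    fx = f(x)
    for t in range(max_iters):                      # Step 2
        if fx == target:
            return t, x
        # Step 3: flip each bit independently with probability 1/n.
        y = [bit ^ 1 if rng.random() < 1.0 / n else bit for bit in x]
        fy = f(y)
        if fy <= fx:                                # Step 4: accept if not worse
            x, fx = y, fy                           # Step 5
        # Steps 6-7: otherwise keep x unchanged.
    raise RuntimeError("no optimum found within max_iters iterations")


def one_max(x):
    """Number of one-bits; its minimum 0 is attained at the all-zero string."""
    return sum(x)


steps, best = one_plus_one_ea(one_max, 20, 0, random.Random(1))
print(steps, best)
```

Accepting on `fy <= fx` rather than `fy < fx` follows the selection step, which also accepts candidates of equal value.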

The optimization time of the (1+1) EA on a function $f$ is the random
variable $T$ that denotes the first point in time $t \in \mathbb{N}$ such that $f(x^{(t)})$ is
minimal.

One elementary linear pseudo-Boolean function for which the optimization
time of the (1+1) EA has been analyzed (e.g., in [Muh92] and [DJW02]) is the
function OneMax: $\{0,1\}^n \to \mathbb{N}$. This function simply counts the number
of one-bits in $x$, that is,

\[ \mathrm{OneMax}(x) := |x|_1 = \sum_{i=1}^{n} x_i. \tag{8} \]

Unlike indicated by the name of this function, we are interested in the time 
the (1+1) EA needs to find its minimum. Thus, in the selection step (Step 4) 
of each iteration, the (1+1) EA accepts the candidate solution y^*) if and 
only if the number of bits equal to 1 does not increase. 

Consider the progress $\Delta^{(t)} := \mathrm{OneMax}(x^{(t)}) - \mathrm{OneMax}(x^{(t+1)})$ of the
(1+1) EA in the $t$-th iteration. By construction of the (1+1) EA, $\Delta^{(t)}$ cannot
be negative. By definition, the number of one-bits in $x^{(t)}$ is $\mathrm{OneMax}(x^{(t)})$.
For each of these one-bits, there is a $(1/n)(1 - 1/n)^{n-1} \ge 1/(en)$ chance
that only this one-bit is flipped when sampling $y^{(t)}$, thus decreasing the
value of OneMax by one. Hence,

\[ E\bigl[\Delta^{(t)} \mid x^{(t)}\bigr] \ge \frac{\mathrm{OneMax}(x^{(t)})}{en}. \]

Thus, multiplicative drift analysis (Theorem 3) immediately gives us the
well-known result

\[ E[T_{\mathrm{OneMax}}] \le en\bigl(1 + \ln E[\mathrm{OneMax}(x^{(0)})]\bigr) = en\Bigl(1 + \ln\frac{n}{2}\Bigr). \]

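Another elementary linear pseudo-Boolean function is BinVal. This
function maps a bit string to the binary value it represents (where $x_1$
represents the lowest and $x_n$ the highest bit).

\[ \mathrm{BinVal}(x) = \sum_{i=1}^{n} 2^{i-1} x_i. \tag{9} \]

Again, for $\Delta^{(t)} := \mathrm{BinVal}(x^{(t)}) - \mathrm{BinVal}(x^{(t+1)})$, we have

\[ E\bigl[\Delta^{(t)} \mid x^{(t)}\bigr] \ge \frac{\mathrm{BinVal}(x^{(t)})}{en} \]

and thus

\[ E[T_{\mathrm{BinVal}}] \le en\bigl(1 + \ln E[\mathrm{BinVal}(x^{(0)})]\bigr) = en\Bigl(1 + \ln\frac{2^n - 1}{2}\Bigr). \]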

Note that the previous inequality gives us only a quadratic upper bound
of $O(n^2)$ for the expected optimization time of the (1+1) EA on BinVal.
However, it is known that for all linear functions, including BinVal, the
expected optimization time of the (1+1) EA is $O(n \ln n)$. We discuss this
in the following subsections and give a simplified proof using multiplicative
drift analysis.
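
The difference between the two bounds can be made concrete numerically. The Python sketch below evaluates $en(1 + \ln(n/2))$ and $en(1 + \ln((2^n - 1)/2))$, using the fact that a uniformly random bit string has expected OneMax value $n/2$ and expected BinVal value $(2^n - 1)/2$. The function names are chosen for this example only.

```python
import math


def drift_bound_onemax(n):
    """e * n * (1 + ln(n / 2)): bound obtained with OneMax as potential."""
    return math.e * n * (1.0 + math.log(n / 2))


def drift_bound_binval(n):
    """e * n * (1 + ln((2^n - 1) / 2)): bound obtained with BinVal as potential."""
    # math.log accepts arbitrarily large Python integers, so 2**n - 1 is safe.
    return math.e * n * (1.0 + math.log(2**n - 1) - math.log(2))


for n in (10, 100, 1000):
    print(n, round(drift_bound_onemax(n)), round(drift_bound_binval(n)))
```

Doubling $n$ roughly quadruples the BinVal-based value, whereas the OneMax-based value grows only slightly faster than linearly. The quadratic growth is caused by the choice of BinVal as potential, not by the behavior of the algorithm.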

3.1 Linear Functions 

A classical test problem for the runtime analysis of randomized search heuris- 
tics is the minimization of linear functions. 

Let $n \in \mathbb{N}$ be a positive integer. A function $f\colon \{0,1\}^n \to \mathbb{R}$ on $n$ bits is
linear if there exist weights $w_1, \dots, w_n \in \mathbb{R}$ such that

\[ f(x) = \sum_{i=1}^{n} w_i x_i \]

for all $x \in \{0,1\}^n$. In [DJW02] it has been argued and it is easily seen that
in the analysis of upper bounds of the expected optimization time of the
(1+1) EA on linear functions we may assume without loss of generality that
the weights $w_i$ are all positive and sorted, that is,

\[ 0 < w_1 \le w_2 \le \dots \le w_n. \tag{10} \]

We simply call such weights monotone. Moreover, for the runtime bounds we
consider in this work it does not matter whether the (1+1) EA minimizes or
maximizes the linear function. This is true since maximizing a function $f$ is
equivalent to minimizing $-f$ and vice versa (for $-f$ we again have to invoke
the above argument which allows us to assume monotonicity of the weights).

Thus, from now on, we suppose that every linear function satisfies con- 
dition (10). Furthermore, we formulate all results for the minimization 
problem, even if the referenced results originally considered the problem 
of maximizing linear functions. 

We have already seen two prominent examples of linear functions, namely
the functions OneMax and BinVal. When minimizing OneMax, the
(1+1) EA accepts a new bit string in the selection step (Step 4) if the
number of one-bits did not increase. In contrast, when minimizing BinVal,
the inequality $2^k > \sum_{i=1}^{k} 2^{i-1}$ implies that the (1+1) EA accepts a new bit
string if and only if the highest-index bit that is touched in the mutation
step (Step 3) is flipped from one to zero.

In spite of this difference in behavior, Droste, Jansen and Wegener
showed in their seminal paper [DJW02] that for all linear functions the
expected optimization time of the (1+1) EA is $\Theta(n \log n)$.

Theorem 4 ([DJW02]). For all positive integers $n \in \mathbb{N}$, the expected
running time of the (1+1) EA on the class of linear functions with non-zero
weights is $\Theta(n \log n)$.

The proof of Droste, Jansen and Wegener applies a level-based argument
to the potential function (called artificial fitness function) $g\colon \{0,1\}^n \to \mathbb{R}$
such that for all $x \in \{0,1\}^n$

\[ g(x) = \sum_{i=1}^{\lfloor n/2 \rfloor} x_i + 2 \sum_{i=\lfloor n/2 \rfloor + 1}^{n} x_i. \tag{11} \]

A much easier proof, avoiding partitioning arguments and instead working
completely in the framework of drift analysis, was given by He and Yao
in [HY04]. There, additive drift analysis is applied to the potential function
$g\colon \{0,1\}^n \to \mathbb{R}$ such that

\[ g(x) = \ln\Bigl(1 + \sum_{i=1}^{\lfloor n/2 \rfloor} x_i + c \sum_{i=\lfloor n/2 \rfloor + 1}^{n} x_i\Bigr) \tag{12} \]

for all $x \in \{0,1\}^n$. For this function, with $1 < c \le 2$ chosen arbitrarily, they
show that for all $x \in \{0,1\}^n \setminus \{(0, \dots, 0)\}$

\[ E\bigl[g(x^{(t)}) - g(x^{(t+1)}) \mid x^{(t)} = x\bigr] = \Omega(1/n). \]

Afterwards, they apply Theorem 1 to show Theorem 4. However, while 
this approach strongly reduced the complexity of the proof in [DJW02], 
introducing the natural logarithm into the potential function still resulted 
in unnecessary case distinctions and even inconsistencies in an early version 
of the proof [HY01,HY02]. 
3.2 The Drift for Linear Functions is Multiplicative

In this subsection, we give a simple proof of the fact that the (1+1) EA
optimizes any linear function in expected time $O(n \log n)$. Our proof is
based on the theorem of multiplicative drift (Theorem 3). Although proofs
for Theorem 4 are known [DJW02, HY04, Jag08], we present this alternative
approach to demonstrate the strength of the multiplicative version of the
classical drift theorem.

In order to apply Theorem 3 we need a suitable potential function. For
this, we choose the function $g\colon \{0,1\}^n \to \mathbb{R}$ such that

\[ g(x) = \sum_{i=1}^{n} \Bigl(1 + \frac{i}{n}\Bigr) x_i \]

for all $x \in \{0,1\}^n$. This function defines the potential as the weighted
distance of the current search point to the optimum (the all-zero string) in
the search space. More precisely, it counts the number of one-bits, where
each bit is assigned a weight between one and two, such that bits which have
higher weight in the objective function $f$ also have higher weight in $g$.

We show that the drift of the (1+1) EA with respect to $g$ is multiplicative,
that is, that condition (5) holds.

Lemma 5. Let $n \in \mathbb{N}$ be a positive integer. Let $f\colon \{0,1\}^n \to \mathbb{R}$ be a linear
function with monotone weights and let $g\colon \{0,1\}^n \to \mathbb{R}$ be the potential
function with $g(x) = \sum_{i=1}^{n} (1 + i/n) x_i$ for all $x \in \{0,1\}^n$.

Let $x \in \{0,1\}^n$ and let $y \in \{0,1\}^n$ be randomly chosen by flipping each
bit in $x$ with probability $1/n$. Let $\Delta(x) := g(x) - g(y)$ if $f(y) \le f(x)$ and
$\Delta(x) = 0$ otherwise. Then

\[ E[\Delta(x)] \ge \frac{g(x)}{4en}. \]

This lemma implies that at every point in the search space the drift is 
at least linear in the current potential value. Thus, the multiplicative drift 
condition (5) holds and Theorem 4 follows directly by applying Theorem 3. 
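
For small $n$, the inequality $E[\Delta(x)] \ge g(x)/(4en)$ of Lemma 5 can be checked exhaustively. The Python sketch below enumerates all offspring $y$ of every non-zero $x$ and computes $E[\Delta(x)]$ exactly with rational arithmetic. The three weight vectors and all function names are assumptions chosen for illustration; such a check is a sanity test for a few instances and no substitute for the proof.

```python
import itertools
import math
from fractions import Fraction


def potential(z):
    """g(z) = sum_i (1 + i/n) * z_i with 1-based bit index i."""
    n = len(z)
    return sum(Fraction(n + i, n) * bit for i, bit in enumerate(z, start=1))


def expected_delta(x, weights):
    """Exact E[Delta(x)] for one mutation and selection step started in x."""
    n = len(x)
    p = Fraction(1, n)
    f = lambda z: sum(w * bit for w, bit in zip(weights, z))
    total = Fraction(0)
    for y in itertools.product((0, 1), repeat=n):
        flips = sum(a != b for a, b in zip(x, y))
        prob = p**flips * (1 - p) ** (n - flips)
        if f(y) <= f(x):  # offspring accepted; otherwise Delta(x) = 0
            total += prob * (potential(x) - potential(y))
    return total


def lemma5_holds(weights):
    """Check E[Delta(x)] >= g(x) / (4 e n) for every non-zero x."""
    n = len(weights)
    for x in itertools.product((0, 1), repeat=n):
        if any(x):
            lower = float(potential(x)) / (4 * math.e * n)
            if float(expected_delta(x, weights)) < lower:
                return False
    return True


examples = {
    "OneMax": [1, 1, 1, 1, 1],
    "BinVal": [1, 2, 4, 8, 16],
    "mixed": [1, 1, 3, 3, 7],   # monotone weights with ties
}
results = {name: lemma5_holds(w) for name, w in examples.items()}
print(results)
```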

Proof of Lemma 5. Since $E[\Delta(x) \mid f(y) > f(x)] = 0$, we have by the law of
total expectation that

\[ E[\Delta(x)] = E[g(x) - g(y) \mid f(y) \le f(x)] \Pr[f(y) \le f(x)]. \tag{13} \]

Let $I = \{i \in \{1, \dots, n\} : x_i = 1\}$. We may distinguish three events
(cases).



Footnote: We might as well perform our analysis using $g$ as defined in (11).
However, our choice of $g$ does not make the somewhat artificial binary
distinction between bits with high and low indices and, thus, seems to be
more natural.
$(C_1)$ There is no index $i \in I$ such that $y_i = 0$ and $f(y) \le f(x)$ holds, that
is, $x = y$.

$(C_2)$ There is exactly one index $i \in I$ such that $y_i = 0$ and $f(y) \le f(x)$
holds.

$(C_3)$ There are at least two different indices $j, \ell \in I$ such that $y_j = 0$ and
$y_\ell = 0$ and $f(y) \le f(x)$ holds.

The only possibility for the event $(C_1)$ to hold is if $x = y$. Therefore,

\[ E[g(x) - g(y) \mid (C_1)] = 0. \tag{14} \]

Next, suppose the event $(C_3)$ holds. Write $g_i := 1 + i/n$ for the weight of
the $i$-th bit in $g$. By linearity of expectation, we have

\[ E[g(x) - g(y) \mid (C_3)] = \sum_{i=1}^{n} E[g_i (x_i - y_i) \mid (C_3)]. \]

On the one hand, the event $(C_3)$ implies that there are (at least) two
indices $j$ and $\ell$ in $\{1, \dots, n\}$ for which $x_j = x_\ell = 1$ and $y_j = y_\ell = 0$.
Since $g_j > 1$ and $g_\ell > 1$, and since $x_i \ge y_i$ for all $i \in I$, we have

\[ \sum_{i \in I} E[g_i (x_i - y_i) \mid (C_3)] \ge 2. \]

On the other hand, if $i \in \{1, \dots, n\} \setminus I$ then

\[ E[g_i (x_i - y_i) \mid (C_3)] = -g_i \Pr[y_i = 1 \mid (C_3)] \ge -\frac{g_i}{n} \]

since the condition $(C_3)$ does not increase the probability of $1/n$ that
$y_i = 1$. Therefore, since the $g_i$'s are at most two and at most $n - 2$ indices
lie outside $I$, we have

\[ E[g(x) - g(y) \mid (C_3)] \ge 2 - \frac{2(n-2)}{n} > 0. \tag{15} \]

Therefore, by the law of total expectation and by (13), (14) and (15),
we have

\[ E[\Delta(x)] \ge E[g(x) - g(y) \mid (C_2)] \Pr[(C_2)] \tag{16} \]

and can focus on the event $(C_2)$.

Suppose that $(C_2)$ holds. For every $i \in I$, we distinguish two events:

$(A_i)$ The $i$-th bit is the only one-bit in $x$ that flips, none of the zero-bits at
the positions larger than $i$ flips, and $f(y) \le f(x)$ holds.

$(B_i)$ The $i$-th bit is the only one-bit in $x$ that flips, at least one of the
zero-bits at the positions larger than $i$ flips, and $f(y) \le f(x)$
holds.

We substitute the right side in (16) and obtain

\[ E[\Delta(x)] \ge \sum_{i \in I} \Bigl( E[\Delta(x) \mid (A_i)] \Pr[(A_i)] + E[\Delta(x) \mid (B_i)] \Pr[(B_i)] \Bigr). \tag{17} \]

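Let $i \in I$ and suppose that the condition $(A_i)$ holds. Then we have
$y_i = 0$ and $y_j = x_j$ for all $j > i$. For a lower bound on $E[\Delta(x) \mid (A_i)]$,
we may suppose that $x_j = 0$ for all $j < i$ and that every flip of a bit with
index $j < i$ is accepted. Therefore, since $i \le n$,

\[ E[\Delta(x) \mid (A_i)] \ge 1 + \frac{i}{n} - \frac{1}{n} \sum_{j=1}^{i-1} \Bigl(1 + \frac{j}{n}\Bigr) = 1 + \frac{1}{n} - \frac{i(i-1)}{2n^2} \ge 1 - \frac{i-1}{2n} \]

and thus $E[\Delta(x) \mid (A_i)]$ is positive. Furthermore, $\Pr[(A_i)] \ge \frac{1}{n}(1 - \frac{1}{n})^{n-1}$,
which is the probability that only the $i$-th bit flips. Hence,

\[ E[\Delta(x) \mid (A_i)] \Pr[(A_i)] \ge \frac{1}{n} \Bigl(1 - \frac{1}{n}\Bigr)^{n-1} \Bigl(1 - \frac{i-1}{2n}\Bigr). \tag{18} \]

Next, suppose that condition $(B_i)$ holds. Then we have $y_i = 0$ and $y_\ell = 1$
for all $\ell \in I \setminus \{i\}$, and there exists a $j > i$ with $j \notin I$ such that $y_j = 1$. In
order to satisfy $f(y) \le f(x)$, $w_j = w_i$ has to hold. This implies $x_\ell = y_\ell$ for
all $\ell \in \{1, \dots, n\} \setminus \{i, j\}$. To see this, recall that the $w_i$'s are monotone and
we condition on the event that the $i$-th bit is the only bit that flips from one
to zero.

Let $J(i) = \{j \in \{i+1, \dots, n\} : x_j = 0 \text{ and } w_j = w_i\}$. For $j \in J(i)$
let $B_{i,j}$ be the event that $y_i = 0$, $y_j = 1$, and $y_\ell = x_\ell$ for $\ell$ not $i$ or $j$. Then

\[ E[\Delta(x) \mid (B_i)] \Pr[(B_i)] = \sum_{j \in J(i)} E[\Delta(x) \mid B_{i,j}] \Pr[B_{i,j}]. \]

We substitute $E[\Delta(x) \mid B_{i,j}] = \frac{i - j}{n}$ and $\Pr[B_{i,j}] = \frac{1}{n^2}(1 - \frac{1}{n})^{n-2}$ in the
previous equation. Since these conditional expectations are always negative,
we may pessimistically assume that $J(i) = \{i+1, \dots, n\}$ and get

\[ \sum_{j \in J(i)} E[\Delta(x) \mid B_{i,j}] \Pr[B_{i,j}] \ge -\frac{1}{n^2} \Bigl(1 - \frac{1}{n}\Bigr)^{n-2} \sum_{j=i+1}^{n} \frac{j - i}{n} = -\frac{1}{n^2} \Bigl(1 - \frac{1}{n}\Bigr)^{n-2} \frac{(n-i)(n-i+1)}{2n} \]

and therefore, using $(n - i)/(n - 1) \le 1$,

\[ E[\Delta(x) \mid (B_i)] \Pr[(B_i)] \ge -\frac{1}{n} \Bigl(1 - \frac{1}{n}\Bigr)^{n-1} \frac{n + 1 - i}{2n}. \tag{19} \]

Finally, we substitute (18) and (19) in (17) and derive

\[ E[\Delta(x)] \ge \sum_{i \in I} \frac{1}{n} \Bigl(1 - \frac{1}{n}\Bigr)^{n-1} \Bigl(1 - \frac{i-1}{2n} - \frac{n+1-i}{2n}\Bigr) = \sum_{i \in I} \frac{1}{2n} \Bigl(1 - \frac{1}{n}\Bigr)^{n-1} \ge \sum_{i \in I} \frac{1}{2en}. \]

Since $g_i = 1 + i/n \le 2$ for all $i \in I$, we have $\sum_{i \in I} 1 \ge g(x)/2$ and therefore

\[ E[\Delta(x)] \ge \frac{g(x)}{4en}, \]

which concludes the proof of the lemma. □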



3.3 Distribution-based Versus Point-wise Drift 

In this subsection we show an almost tight upper bound on the expected 
optimization time of the (1+1) EA on linear functions. 

If we take a closer look at Lemma 5, we see that it holds point-wise, that
is, it guarantees

\[ E\bigl[g(x^{(t)}) - g(x^{(t+1)}) \mid x^{(t)} = x\bigr] \ge \delta\, g(x) \tag{20} \]

for all $x \in \{0,1\}^n \setminus \{(0, \dots, 0)\}$. This is far stronger than the positive average
drift condition (5) which only requires

\[ E\bigl[g(x^{(t)}) - g(x^{(t+1)}) \mid g(x^{(t)}) = s,\ T > t\bigr] \ge \delta s \tag{21} \]

for all $s \in \mathbb{R}$ such that $\Pr[g(x^{(t)}) = s,\ T > t] > 0$.

The advantage of the stronger point-wise drift assumption is that it 
immediately guarantees that the result of Theorem 4 holds for all initial 
individuals. 

The main reason, however, for not using the weaker condition (21) is that
this requires a deeper understanding of the probability distribution of $x^{(t)}$.

Let us stress that finding a potential function satisfying the stronger
point-wise drift condition is usually very tricky. For example, one may
ask why not take OneMax($x$) as potential function to bound the expected
optimization time of the (1+1) EA for minimizing linear functions.

However, an easy observation reveals that there is an objective function $f$
and a search point $x$ such that OneMax yields too small a drift with respect to $f$.
To see this, let $x = (x_1, \dots, x_n) := (0, \dots, 0, 1)$ and let $f :=$ BinVal be the
function to be minimized. Then the point-wise drift (20) with respect to
OneMax is only $1/n^2$. This example shows that finding a potential function
yielding point-wise drift for all $x$ and all $f$ may be difficult. This observation
is not to be confused with that in the discussion following (9). There, we
determined the drift using the function BinVal itself as potential. Here, we
use OneMax, that is, the 1-norm as potential function.
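
The drift in this example can be computed exactly for small $n$. The Python sketch below enumerates all offspring of $x = (0, \dots, 0, 1)$ under minimization of BinVal and sums the accepted changes in the number of one-bits. The function name is hypothetical, and the enumeration is only practical for small $n$.

```python
import itertools
from fractions import Fraction


def onemax_drift_under_binval(n):
    """Exact expected decrease of OneMax in one step of the (1+1) EA that
    minimizes BinVal, started in x = (0, ..., 0, 1)."""
    x = (0,) * (n - 1) + (1,)
    bin_val = lambda z: sum(2**i * bit for i, bit in enumerate(z))
    p = Fraction(1, n)
    total = Fraction(0)
    for y in itertools.product((0, 1), repeat=n):
        flips = sum(a != b for a, b in zip(x, y))
        prob = p**flips * (1 - p) ** (n - flips)
        if bin_val(y) <= bin_val(x):  # selection step accepts y
            total += prob * (sum(x) - sum(y))
    return total


drifts = {n: onemax_drift_under_binval(n) for n in range(2, 8)}
print(drifts)
```

Every offspring that flips the highest bit is accepted, no matter how many lower zero-bits become one. On average $(n-1)/n$ lower bits flip along with it, which leaves an expected gain of $(1/n)(1 - (n-1)/n) = 1/n^2$.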

Jägersküpper [Jag08] was the first to overcome the difficulties of
point-wise drift. While he still avoids completely analyzing the actual
distribution of $x^{(t)}$, he does show the following property of this distribution
which in turn allows him to use an average drift approach. In this way, he
omits the need for point-wise drift. Jägersküpper's simple observation is that
at any time step $t$, the more valuable bits are more likely to be in the right
setting (cf. Theorem 1 in [Jag08]).

Theorem 6 ([Jag08]). Let $n \in \mathbb{N}$ be a positive integer and let $x^{(t)}$ denote
the random individual (distributed over $\{0,1\}^n$) after $t \in \mathbb{N}$ iterations of the
(1+1) EA minimizing a linear function $f\colon \{0,1\}^n \to \mathbb{R}$. Then

\[ \Pr[x_1^{(t)} = 0] \le \dots \le \Pr[x_n^{(t)} = 0]. \]

Moreover, for all $k \in \{0, \dots, n\}$, this statement remains true if we condition
on $\mathrm{OneMax}(x^{(t)}) = k$.

Using this theorem, he was able to show a lower bound of $\Omega(1/n)$ for the
drift of OneMax as potential function for any linear function.

Lemma 7 ([Jag08]). Let $n \in \mathbb{N}$ be a positive integer and let $f\colon \{0,1\}^n \to \mathbb{N}$
be a linear function. Let $x^{(t)}$ be the individual in the $t$-th iteration of the
(1+1) EA minimizing $f$. Then

E[OneMax(xW) - ONEMAx(j;(*+^h I OneMax(xW) = k]> iiH^. 

en 

holds for all k £ {0, . . . , n} and t £N. 

In addition to a more natural proof of the $O(n \ln(n))$ bound for the
expected optimization time of the (1+1) EA minimizing a linear function,
Jägersküpper was able to give a meaningful upper bound on the leading
constant (cf. Theorem 2 in [Jag08]).

Theorem 8 ([Jag08]). For all positive integers $n \in \mathbb{N}$, the expected
optimization time of the (1+1) EA minimizing a linear function on $n$ bits is
at most of order $(1 + o(1))\,2.02\,e\,n \ln(n)$.

Using multiplicative drift analysis (Theorem 3) on the result of Lemma 7
and thus replacing the halving argument employed by Jägersküpper for the
proof of Theorem 8, the constant of $2.02e$ in the upper bound of the previous
theorem instantly improves to $1.39e$. In the light of our lower bound of
$1.00e$, to be proven in the next subsection, this is considerable progress.

Theorem 9. For all positive integers $n \in \mathbb{N}$, the expected
optimization time of the (1+1) EA minimizing a linear function on $n$ bits is
at most of order
$(1 + o(1))\,\frac{e}{e-2}\,n \ln(n) \approx (1 + o(1))\,1.39\,e\,n \ln(n)$.

3.4 The (1+1) EA Optimizes OneMax Faster than any Function with a Unique Global Optimum

In this section, we show that the expected optimization time of the (1+1) EA
on any pseudo-Boolean function with a unique global optimum is at least as 
large as its expected optimization time on the basic function OneMax. In 
particular, this is true for every linear function with non-zero coefficients. 

In other words, if a function is easier to optimize than OneMax, then 
this can only be due to the fact that it has more than one global optimum. 
The general lower bound then follows from the following theorem by Doerr,
Fouz and Witt [DFW10], which provides a lower bound for OneMax.

Theorem 10 ([DFW10]). For all positive integers $n \in \mathbb{N}$, the
expected optimization time of the (1+1) EA minimizing OneMax on $n$ bits is at
least $(1 - o(1))\,e\,n \ln(n)$.

Thus, it remains to show that OneMax is optimized fastest. The result
itself was announced by Scheder and Welzl [SW08]. Their idea to prove this
statement, however, differs from the one given below.

Theorem 11. Let $n \in \mathbb{N}$ be a positive integer. The expected
optimization time of the (1+1) EA on any function
$f \colon \{0,1\}^n \to \mathbb{R}$ that has a unique global optimum is at
least as large as its expected optimization time on OneMax.

The theorem can be formalized as follows: Let $f$ be a function with a
unique global optimum. Let $(x^{(t)})_{t \in \mathbb{N}}$ be the search points
generated by the (1+1) EA minimizing $f$. Let
$T_f := \min\{t \in \mathbb{N} \mid f(x^{(t)}) = 0\}$ be the optimization time
of the (1+1) EA on $f$. Then $E[T_f] \ge E[T_{\mathrm{OneMax}}]$.

Theorems 10 and 11 immediately yield the following. 

Corollary 12. For all positive integers $n \in \mathbb{N}$, the expected
optimization time of the (1+1) EA minimizing a function with a unique global
optimum on $n$ bits is at least $(1 - o(1))\,e\,n \ln(n)$.
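
To get a feeling for these bounds, one can run the algorithm directly. The sketch below is a minimal simulation under stated assumptions: the initial search point is chosen uniformly at random, each bit flips independently with probability $1/n$, an offspring is accepted if it is not worse, and the optimum has function value 0. The function names, the iteration cap `max_iters`, and the example weights $1, \ldots, n$ are choices made here for illustration.

```python
import math
import random


def one_plus_one_ea(f, n, rng, max_iters=10**6):
    """Run the (1+1) EA minimizing f on {0,1}^n.

    Returns the number of iterations until f(x) == 0 (capped at max_iters).
    """
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = f(x)
    t = 0
    while fx != 0 and t < max_iters:
        # Standard bit mutation: flip each bit independently with prob. 1/n.
        y = [b ^ (rng.random() < 1.0 / n) for b in x]
        fy = f(y)
        if fy <= fx:  # selection: keep the offspring if it is not worse
            x, fx = y, fy
        t += 1
    return t


def average_time(f, n, runs, seed=0):
    """Empirical mean optimization time over independent runs."""
    rng = random.Random(seed)
    return sum(one_plus_one_ea(f, n, rng) for _ in range(runs)) / runs


def linear(x):
    """Example linear function with weights 1, ..., n (an arbitrary choice)."""
    return sum((i + 1) * b for i, b in enumerate(x))


if __name__ == "__main__":
    n, runs = 30, 200
    print("OneMax :", average_time(sum, n, runs))
    print("linear :", average_time(linear, n, runs))
    print("e n ln n:", math.e * n * math.log(n))
```

For such small $n$ the lower-order terms are still visible, so the empirical means should be read as a sanity check rather than as a confirmation of the asymptotic constants.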

For the proof of Theorem 11 we first show a preliminary lemma. It
formalizes the following intuition. Let $x$ and $\tilde{x}$ be two search
points such that $|x|_1 \le |\tilde{x}|_1$. Then the probability that the
(1+1) EA samples a new search point with exactly $j < |x|_1$ one-bits from $x$
is at least as big as from $\tilde{x}$.

Lemma 13. Let $n \in \mathbb{N}$ with $n > 1$. Let $x, \tilde{x} \in \{0,1\}^n$
with $|x|_1 \le |\tilde{x}|_1$. Let $y$ and $\tilde{y}$ be two random points in
$\{0,1\}^n$ obtained from $x$ and $\tilde{x}$ by independently flipping with
probability $1/n$ each bit of $x$ and $\tilde{x}$, respectively.
Then for every $j \in \{0, \ldots, |x|_1 - 1\}$,

$$\Pr[|y|_1 = j] \ge \Pr[|\tilde{y}|_1 = j].$$
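
Proof. Let $k := |x|_1$. The lemma holds trivially if $|\tilde{x}|_1 = k$.
Suppose that $|\tilde{x}|_1 = k + 1$. Then

$$\Pr[|y|_1 = j] = \Bigl(1 - \frac{1}{n}\Bigr)^{n} \sum_{i=0}^{\min\{j,\,n-k\}} \binom{k}{j-i} \binom{n-k}{i} \Bigl(\frac{1/n}{1 - 1/n}\Bigr)^{k-j+2i}$$

and

$$\Pr[|\tilde{y}|_1 = j] = \Bigl(1 - \frac{1}{n}\Bigr)^{n} \sum_{i=0}^{\min\{j,\,n-k-1\}} \binom{k+1}{j-i} \binom{n-k-1}{i} \Bigl(\frac{1/n}{1 - 1/n}\Bigr)^{k+1-j+2i}.$$

As all summands in the previous two equations are positive, it suffices to
see that the quotient

$$\frac{\binom{k}{j-i} \binom{n-k}{i} \bigl(\frac{1/n}{1 - 1/n}\bigr)^{k-j+2i}}{\binom{k+1}{j-i} \binom{n-k-1}{i} \bigl(\frac{1/n}{1 - 1/n}\bigr)^{k+1-j+2i}} = \frac{(k + 1 - j + i)(n - k)(n - 1)}{(k + 1)(n - k - i)}$$

is minimal for $i = 0$ and $j = k - 1$, where it equals $2(n-1)/(k+1) \ge 1$
because $k + 1 \le n$ and $n \ge 2$. Therefore, it is at least 1 for all values
$0 \le i \le \min\{j, n - k - 1\}$.

Thus, for $|\tilde{x}|_1 = k + 1$ the lemma also holds. Finally, for
$|\tilde{x}|_1 > k + 1$, the lemma follows by induction based on the case
$|\tilde{x}|_1 = k + 1$. □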
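
Lemma 13 can also be confirmed numerically for small $n$. The following Python sketch computes the distribution of the number of one-bits after mutation exactly and checks the inequality for all admissible pairs of starting points. The function names are hypothetical, and by symmetry only the number of one-bits of the starting point is passed in.

```python
from fractions import Fraction
from math import comb


def prob_ones(n, k, j):
    """Exact Pr[|y|_1 = j] when every bit of a point with k one-bits
    is flipped independently with probability 1/n."""
    p = Fraction(1, n)
    total = Fraction(0)
    for i in range(0, min(j, n - k) + 1):  # i zero-bits flip to one
        d = k - j + i                       # d one-bits flip to zero
        if 0 <= d <= k:
            total += (comb(k, d) * comb(n - k, i)
                      * p ** (d + i) * (1 - p) ** (n - d - i))
    return total


def lemma13_holds(n):
    """Check Pr[|y|_1 = j] >= Pr[|y~|_1 = j] for all k <= k~ and j < k."""
    return all(
        prob_ones(n, k, j) >= prob_ones(n, k_tilde, j)
        for k in range(n + 1)
        for k_tilde in range(k, n + 1)
        for j in range(k)
    )


for n in range(2, 9):
    print(n, lemma13_holds(n))
```

Such a check is of course no substitute for the proof, but it guards against index errors in the binomial expressions.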

To prove the main result of this section, Theorem 11, we need some
additional notation. Let $f$ be a function with a unique global optimum $x^*$.
Without loss of generality, we may assume that $x^* := (0, \ldots, 0)$ is the
unique minimum of $f$. This is justified by the observation that the (1+1) EA
treats the bit-values 0 and 1 symmetrically, that is, we might reinterpret
one-bits in $x^*$ as zero-bits without changing the behavior of the algorithm.

Let $\mu(x) := E[T_{\mathrm{OneMax}} \mid x^{(0)} = x]$ and
$\tilde{\mu}(x) := E[T_f \mid x^{(0)} = x]$ be the expected optimization times
of the (1+1) EA starting in the point $x$ and minimizing OneMax and $f$,
respectively.

For every $k \in \{0, \ldots, n\}$ let

$$\mu_k := \min\{\mu(x) \mid x \in \{0,1\}^n,\ |x|_1 = k\}$$

be the expected optimization time of the (1+1) EA optimizing OneMax starting
in a point with exactly $k$ one-bits.
Furthermore, let

$$\tilde{\mu}_k := \min\{\tilde{\mu}(x) \mid x \in \{0,1\}^n,\ |x|_1 \ge k\}$$

be the minimum expected optimization time of the (1+1) EA minimizing $f$ and
starting in a point $x$ with at least $k$ one-bits (note the difference to
$\mu_k$).

Note that, due to the symmetry of the function OneMax, $\mu_k = \mu(x)$
for every $x \in \{0,1\}^n$ with exactly $k$ one-bits.

Proof of Theorem 11. We inductively show for all $k$ that
$\mu_k \le \tilde{\mu}_k$. Clearly $\mu_0 = 0 = \tilde{\mu}_0$. Therefore, let
$k \in \{0, \ldots, n - 1\}$ and suppose that $\mu_i \le \tilde{\mu}_i$ for
all $i \le k$.

Let $x \in \{0,1\}^n$ with $|x|_1 = k + 1$ be arbitrary and let
$y \in \{0,1\}^n$ be a random point generated by flipping each bit in $x$
independently with probability $1/n$.

The (1+1) EA minimizing OneMax and starting in $x$ accepts $y$ in the
selection step (Step 4) if and only if $|y|_1 \le |x|_1$. Furthermore, we have
$\mu(x) = \mu(y)$ if $|y|_1 = |x|_1 = k + 1$. Thus,

$$\mu(x) = 1 + \mu(x) \Pr[|y|_1 \ge k + 1] + \sum_{j=0}^{k} E[\mu(y) \mid |y|_1 = j] \Pr[|y|_1 = j]$$

and therefore

$$\mu_{k+1} = 1 + \mu_{k+1} \Pr[|y|_1 \ge k + 1] + \sum_{j=0}^{k} \mu_j \Pr[|y|_1 = j]. \qquad (22)$$

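Next, let $\tilde{x} \in \{0,1\}^n$ be chosen arbitrarily such that
$|\tilde{x}|_1 \ge k + 1$ and $\tilde{\mu}_{k+1} = \tilde{\mu}(\tilde{x})$.
Furthermore, let $\tilde{y} \in \{0,1\}^n$ be a random point generated by
flipping each bit in $\tilde{x}$ independently with probability $1/n$. Let
$\tilde{z} = \tilde{y}$ if $f(\tilde{y}) \le f(\tilde{x})$ and
$\tilde{z} = \tilde{x}$ otherwise.

Then

$$\tilde{\mu}(\tilde{x}) = 1 + E[\tilde{\mu}(\tilde{z}) \mid |\tilde{z}|_1 \ge k + 1] \Pr[|\tilde{z}|_1 \ge k + 1] + \sum_{j=0}^{k} E[\tilde{\mu}(\tilde{z}) \mid |\tilde{z}|_1 = j] \Pr[|\tilde{z}|_1 = j]$$

and therefore, by definition of $\tilde{\mu}_j$,

$$\tilde{\mu}_{k+1} \ge 1 + \tilde{\mu}_{k+1} \Pr[|\tilde{z}|_1 \ge k + 1] + \sum_{j=0}^{k} \tilde{\mu}_j \Pr[|\tilde{z}|_1 = j]. \qquad (23)$$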

Now, for all $0 \le j \le k$, we have

$$\Pr[|\tilde{z}|_1 = j] \le \Pr[|\tilde{y}|_1 = j] \le \Pr[|y|_1 = j].$$

The first inequality holds, since the event $|\tilde{z}|_1 = j$ implies the
event $|\tilde{y}|_1 = j$. The second inequality follows from Lemma 13, since
$|x|_1 = k + 1 \le |\tilde{x}|_1$.

Considering this relation and the fact that the $\tilde{\mu}_i$ are
monotonically increasing in $i$, we obtain from (23) that

$$\tilde{\mu}_{k+1} \ge 1 + \tilde{\mu}_{k+1} \Pr[|y|_1 \ge k + 1] + \sum_{j=0}^{k} \tilde{\mu}_j \Pr[|y|_1 = j].$$

Therefore, the induction hypothesis yields that

$$\tilde{\mu}_{k+1} \ge 1 + \tilde{\mu}_{k+1} \Pr[|y|_1 \ge k + 1] + \sum_{j=0}^{k} \mu_j \Pr[|y|_1 = j].$$

We subtract both sides of equation (22) from the previous inequality and
immediately get $\tilde{\mu}_{k+1} \ge \mu_{k+1}$, which concludes the
induction. Thus, for all $x \in \{0,1\}^n$, we have

$$\tilde{\mu}(x) \ge \tilde{\mu}_{|x|_1} \ge \mu_{|x|_1} = \mu(x).$$

Consequently, $E[T_{\mathrm{OneMax}}] \le E[T_f]$ holds. □
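
Equation (22) is also a practical recipe: solving it for $\mu_{k+1}$ gives the exact expected optimization times on OneMax level by level. The sketch below does this with rational arithmetic. It assumes the mutation probability $1/n$ used throughout and a uniformly random initial search point for the final average; the function names are illustrative.

```python
from fractions import Fraction
from math import comb, e, log


def prob_ones(n, k, j):
    """Exact Pr[|y|_1 = j] after mutating a point with k one-bits."""
    p = Fraction(1, n)
    total = Fraction(0)
    for i in range(0, min(j, n - k) + 1):
        d = k - j + i
        if 0 <= d <= k:
            total += (comb(k, d) * comb(n - k, i)
                      * p ** (d + i) * (1 - p) ** (n - d - i))
    return total


def onemax_times(n):
    """mu_0, ..., mu_n obtained by solving equation (22) for mu_{k+1}."""
    mu = [Fraction(0)]
    for k in range(1, n + 1):
        probs = [prob_ones(n, k, j) for j in range(k)]
        improve = sum(probs)  # equals 1 - Pr[|y|_1 >= k]
        mu.append((1 + sum(m * q for m, q in zip(mu, probs))) / improve)
    return mu


def expected_time_uniform_start(n):
    """Average of mu_k over a uniformly random initial search point."""
    mu = onemax_times(n)
    return sum(comb(n, k) * mu[k] for k in range(n + 1)) / 2 ** n


for n in (5, 10, 20):
    print(n, float(expected_time_uniform_start(n)), e * n * log(n))
```

By Theorem 11, the values returned by `expected_time_uniform_start` are lower bounds for the expected optimization time on every function with a unique global optimum and the same $n$.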

4 Multiplicative Drift on Combinatorial Problems 

So far, we have seen that multiplicative drift analysis can be used to simplify 
the runtime analysis of the (1+1) EA on linear pseudo-Boolean functions 
while producing sharper bounds. In this section, we see that optimization 
processes with multiplicative drift occur quite naturally in combinatorial 
optimization, too. We demonstrate this claim on two prominent examples, 
the minimum spanning tree problem and the single source shortest path 
problem. 

4.1 The Minimum Spanning Tree Problem 

In this subsection, we consider the minimum spanning tree (MST) problem 
analyzed in [NW07]. Let $G = (V, E)$ be a connected graph with $n$ vertices,
$m$ edges $e_1, \ldots, e_m$, and positive integer edge weights
$w_1, \ldots, w_m$. In [NW07], a spanning tree is represented by a bit string
$x \in \{0,1\}^m$ with $x_i = 1$ marking the presence of the edge $e_i$ in the
tree.

The fitness value of such a tree is defined by
$w(x) = \sum_{i=1}^{m} w_i x_i + p(x)$, with $p(x)$ being a penalty term
ensuring that once the (1+1) EA has found a spanning tree it no longer accepts
bit strings that do not represent spanning trees (a new bit string is accepted
if the fitness value decreases). The minimum weight of a spanning tree is
denoted by $w_{\mathrm{opt}}$ and the maximal edge weight by $w_{\max}$.

In Lemma 1 of [NW07], Neumann and Wegener derive from [Kan87] the 
following statement. 

Lemma 14 ([NW07]). Let $x \in \{0,1\}^m$ be a search point describing a
non-minimum spanning tree. Then there exist a $k \in \{1, \ldots, n - 1\}$ and
$k$ different accepted 2-bit flips such that the average weight decrease of
these flips is at least $(w(x) - w_{\mathrm{opt}})/k$.

Multiplicative drift analysis now gives us a reasonably small constant in
the upper bound of the expected optimization time of the (1+1) EA on the
MST problem.

Theorem 15. The expected optimization time of the (1+1) EA on the MST
problem starting with an arbitrary spanning tree of a non-empty graph is at
most $2em^2(1 + \ln m + \ln w_{\max})$.

Proof. For all $t \in \mathbb{N}$, let $x^{(t)}$ be the search point of the
(1+1) EA for the MST problem at time $t$ and let
$X^{(t)} = w(x^{(t)}) - w_{\mathrm{opt}}$. Then

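Now, let $t \in \mathbb{N}$ and $x \in \{0,1\}^m \setminus \{(0, \ldots, 0)\}$
be fixed. Let the points $y^{(1)}, \ldots, y^{(k)}$ with
$k \in \{0, \ldots, n - 1\}$ be the $k$ distinct search points in $\{0,1\}^m$
generated from $x$ by the $k$ different 2-bit flips according to Lemma 14.
That is, we have $w(y^{(i)}) < w(x)$ for all $i \in \{1, \ldots, k\}$ and

$$\sum_{i=1}^{k} \bigl(w(x) - w(y^{(i)})\bigr) \ge w(x) - w_{\mathrm{opt}}. \qquad (24)$$

Since the $y^{(i)}$'s are each generated from $x$ by a 2-bit flip, we have

$$\Pr[x^{(t+1)} = y^{(i)} \mid x^{(t)} = x] = \Bigl(1 - \frac{1}{m}\Bigr)^{m-2} \Bigl(\frac{1}{m}\Bigr)^{2} \qquad (25)$$

for all $i \in \{1, \ldots, k\}$. Furthermore

$$E[X^{(t)} - X^{(t+1)} \mid x^{(t)} = x,\ x^{(t+1)} = y^{(i)}] = w(x) - w(y^{(i)}) \qquad (26)$$

holds for all $i \in \{1, \ldots, k\}$.

The (1+1) EA never increases the current $w$-value of a search point, that
is, $X^{(t)} - X^{(t+1)}$ is non-negative. Thus, we have by (25) and (26) that

$$E[X^{(t)} - X^{(t+1)} \mid x^{(t)} = x] \ge \sum_{i=1}^{k} \bigl(w(x) - w(y^{(i)})\bigr) \Bigl(1 - \frac{1}{m}\Bigr)^{m-2} \Bigl(\frac{1}{m}\Bigr)^{2}$$

and therefore, by inequality (24), we have for all $x \in \{0,1\}^m$ that

$$E[X^{(t)} - X^{(t+1)} \mid x^{(t)} = x] \ge \frac{w(x) - w_{\mathrm{opt}}}{em^2}.$$

In other words,

$$E[X^{(t)} - X^{(t+1)} \mid X^{(t)}] \ge \frac{X^{(t)}}{em^2}$$

and the theorem follows from Theorem 3 with
$1 \le X^{(t)} \le m\,w_{\max}$. □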
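
The last step of the proof can be traced numerically. The Python sketch below assumes that Theorem 3 bounds the expected stopping time by $(1 + \ln(s_0/s_{\min}))/\delta$ for a potential with multiplicative drift $\delta$, starting value at most $s_0$, and minimal positive value $s_{\min}$. This form, like the function and parameter names, is an assumption made for illustration, since Theorem 3 is stated elsewhere in the paper.

```python
import math


def multiplicative_drift_bound(delta, s0, smin=1.0):
    """Assumed form of the Theorem 3 bound: (1 + ln(s0/smin)) / delta."""
    return (1.0 + math.log(s0 / smin)) / delta


def two_bit_flip_prob(m):
    """Probability (25) that one specific 2-bit flip occurs on m bits."""
    return (1.0 - 1.0 / m) ** (m - 2) * (1.0 / m) ** 2


def mst_bound(m, w_max):
    """Bound obtained with delta = 1/(e m^2) and 1 <= X <= m * w_max."""
    delta = 1.0 / (math.e * m * m)
    return multiplicative_drift_bound(delta, s0=m * w_max, smin=1.0)


for m, w_max in [(10, 5), (100, 1000)]:
    stated = 2 * math.e * m * m * (1 + math.log(m) + math.log(w_max))
    print(m, w_max, mst_bound(m, w_max), stated)
```

Under these assumptions the computed value stays below the bound stated in Theorem 15, and `two_bit_flip_prob(m)` is at least $1/(em^2)$, which is the estimate used between (25) and the final drift inequality.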






4.2 The Single-source Shortest Path Problem 

In [BBD+09], Baswana, Biswas, Doerr, Friedrich, Kurur, and Neumann 
study an evolutionary algorithm that solves the single-source shortest path 
(SSSP) problem on a directed graph with n vertices via evolving a shortest- 
path tree. In the analysis of the upper bound for the expected optimization 
time, the authors introduce the gap $g_i$ in iteration $i$ as the difference in
fitness between the current shortest-path tree candidate and an optimal 
shortest-path tree. 

For every vertex in the tree, its weight in the tree is defined as the sum
over the weights of edges in the paths leading to the root vertex, or as the
penalty term $n w_{\max}$ if the vertex is not connected to the root. The
fitness of a shortest-path tree candidate is then the sum over the weights of
all vertices in the tree. Thus the maximal gap is $n^2 w_{\max}$. In Lemma 1
of [BBD+09], the authors then provide the following statement.
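
The fitness just described can be sketched in a few lines of Python. The representation is an assumption made here, not taken from [BBD+09]: a candidate is a list `parent` with `parent[v]` the predecessor of vertex `v` (and `None` where no predecessor is set), and `weight` maps a directed edge `(u, v)` to its weight. A vertex whose predecessor chain does not reach the root receives the penalty $n w_{\max}$.

```python
def sssp_fitness(parent, weight, root, w_max):
    """Sum of vertex weights of a shortest-path tree candidate.

    parent[v] is the assumed predecessor of v (None if v has none);
    weight[(u, v)] is the weight of the directed edge from u to v.
    """
    n = len(parent)
    penalty = n * w_max
    total = 0
    for v in range(n):
        dist, u, seen = 0, v, set()
        # Walk towards the root; stop on a missing predecessor or a cycle.
        while u != root and u not in seen and parent[u] is not None:
            seen.add(u)
            dist += weight[(parent[u], u)]
            u = parent[u]
        total += dist if u == root else penalty
    return total


# Path 0 -> 1 -> 2 with edge weights 2 and 3, root 0.
weights = {(0, 1): 2, (1, 2): 3}
print(sssp_fitness([None, 0, 1], weights, root=0, w_max=3))     # connected
print(sssp_fitness([None, 0, None], weights, root=0, w_max=3))  # vertex 2 cut off
```

Since each of the $n$ vertices contributes at most $n w_{\max}$, the fitness, and hence the gap, never exceeds $n^2 w_{\max}$.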

Lemma 16 ([BBD+09]). Let $g_i$ denote the gap after $i$ mutations. Then it
holds for the conditional expectation $E[g_{i+1} \mid g_i = g]$ that



To this, we can directly apply Theorem 3, taking the gap as a potential. 
We obtain the following result with a precise constant for the upper bound. 

Theorem 17. The expected optimization time of the (1+1) EA in [BBD+09]
on the SSSP problem starting with an arbitrary shortest-path tree candidate
is at most $e n^3 (1 + 2 \ln n + \ln w_{\max})$.

4.3 The Euler Tour Problem 

The Euler tour problem is to find an Euler tour (a closed walk that visits
every edge exactly once) in an input graph which permits such a tour.

In [DJ07], possible variants of the (1+1) EA for the Euler tour problem
are analyzed. For the variant using the so-called edge-based distribution on 
cycle covers, the search space is given by adjacency list matchings, where 
each matching represents a cover of the input graph with edge-disjoint cycles. 
The fitness of a matching is given by the total number of cycles in the cover. 
Thus, a fitness of one implies that the graph is covered by a single cycle — 
an Euler tour. 

Finding such a tour is then a minimization problem over this search
space. For this setup, the following statement is implicitly shown in the
proof of Theorem 3 in [DJ07].

Lemma 18. In a single iteration of the (1+1) EA in [DJ07] for the Euler
tour problem using the edge-based distribution and starting with an arbitrary
cycle cover, the probability to decrease the fitness $f(x)$ of the current
search point $x$ by one (provided it was not minimal before) is at least
$f(x)/(em)$, where $m$ is the number of edges of the input graph.

If we set the fitness minus one as potential, this lemma immediately
implies that the expected drift is at least $f(x)/(em)$. Moreover, the
starting potential is at most $m/3$ (each cycle in the cover has at least
three edges). Again, we can apply Theorem 3 and reproduce the upper bound on
the expected optimization time, specifying the leading constant in the
process.

Theorem 19. The expected optimization time of the (1+1) EA in [DJ07] 
for the Euler tour problem using the edge-based distribution and starting with 
an arbitrary cycle cover is at most em In m, where m is the number of edges 
in the input graph. 
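
The constant in Theorem 19 can be traced in one line. The derivation below assumes that Theorem 3 bounds the expected stopping time $T$ by $(1 + \ln(s_0/s_{\min}))/\delta$ for a potential with multiplicative drift $\delta$, starting value at most $s_0$, and minimal positive value $s_{\min}$; these symbol names are chosen here for illustration. With the potential $f(x) - 1$, Lemma 18 gives $\delta = 1/(em)$, and we have $s_{\min} = 1$ and $s_0 \le m/3$:

```latex
\begin{align*}
E[T] &\le \frac{1 + \ln(s_0 / s_{\min})}{\delta}
      \le em \bigl(1 + \ln(m/3)\bigr) \\
     &= em \bigl(\ln m + 1 - \ln 3\bigr)
      \le em \ln m ,
\end{align*}
```

where the last step uses $\ln 3 > 1$.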

5 Discussion and Outlook 

In this work, we showed that the multiplicative drift condition (5) occurs
naturally in the runtime analysis of the (1+1) EA for a number of prominent
optimization problems (linear functions, minimum spanning trees, shortest
paths, and Euler tours). In such situations our multiplicative drift theorem
(Theorem 3) yields good runtime bounds.

We applied this new tool to various settings. First, we used it to gain 
new insight in the classical problem of how the (1+1) EA optimizes linear 
functions. 

We presented a simplified proof of the, by now, well-known fact that
the (1+1) EA with mutation probability $1/n$ optimizes any linear function
in time $O(n \log n)$. Moreover, we applied our result to the
distribution-based drift analysis of Jägersküpper and obtained a new upper
bound of $(1 + o(1))\,1.39\,e\,n \ln(n)$ for the expected optimization time of
the (1+1) EA for arbitrary linear functions.

We complemented this upper bound by a lower bound of
$(1 - o(1))\,e\,n \ln(n)$. To do so, we showed that, among all functions with
a unique global optimum, OneMax is the one that the (1+1) EA optimizes
fastest. By this we extended a recent lower bound of $(1 - o(1))\,e\,n \ln(n)$
for the expected optimization time on OneMax to all functions having a
unique global optimum.

Our upper and lower bounds for the expected optimization times of the
(1+1) EA on arbitrary linear functions are relatively close. This raises the
question whether all linear functions have the same expected optimization
time of $(1 + o(1))\,e\,n \ln(n)$.

Finally, we reviewed previous runtime analyses of the (1+1) EA on the
combinatorial problems of finding a minimum spanning tree, shortest path
tree, or Euler tour in a graph. For all three cases, we exhibited the
appearance of multiplicative drift and determined the leading constants in the
bounds of the expected optimization times.

In the light of these natural occurrences of multiplicative drift, we are
optimistic that further applications of multiplicative drift analysis will
appear in the near future.

Note Added in Proof

Recently, Doerr and Goldberg [DG10] have shown that in Theorem 3, the
stopping time $T$ is with high probability at most of the same order as the
upper bound on its expectation given in inequality (5), if $X^{(0)}$ is at
least $\Omega(n)$. Thus, the implicit upper bound given in Theorem 9 and the
bounds in Theorem 15 and Theorem 17 also hold with high probability, if we
allow a slightly larger leading constant.